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Parameter estimation in astrophysics often requires the use of com- 
plex physical models. In this paper we study the problem of estimating 
the parameters that describe star formation history (SFH) in galaxies. 
Here, high-dimensional spectral data from galaxies are appropriately mod- 
eled as linear combinations of physical components, called simple stellar 
populations (SSPs), plus some nonlinear distortions. Theoretical data for 
each SSP is produced for a fixed parameter vector via computer modeling. 
Though the parameters that define each SSP are continuous, optimizing 
the signal model over a large set of SSPs on a fine parameter grid is com- 
putationally infeasible and inefficient. The goal of this study is to estimate 
the set of parameters that describes the SFH of each galaxy. These target 
parameters, such as the average ages and chemical compositions of the 
galaxy's stellar populations, are derived from the SSP parameters and the 
component weights in the signal model. Here, we introduce a principled 
approach of choosing a small basis of SSP prototypes for SFH parameter 
estimation. The basic idea is to quantize the vector space and effective 
support of the model components. In addition to greater computational 
efficiency, we achieve better estimates of the SFH target parameters. In 
simulations, our proposed quantization method obtains a substantial im- 
provement in estimating the target parameters over the common method 
of employing a parameter grid. Sparse coding techniques are not appropri- 
ate for this problem without proper constraints, while constrained sparse 
coding methods perform poorly for parameter estimation because their 
objective is signal reconstruction, not estimation of the target parameters. 
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1. Introduction. In astronomy and cosmology one is often challenged by 
the complexity of the relationship between the physical parameters to be 
estimated and the distribution of the observed data. In a typical application 
the mapping from the parameter space to the observed data space is built 
on sophisticated physical theory or simulation models or both. These scien- 
tifically motivated models are growing ever more complex and nuanced as 
a result of both increased computing power and improved understanding of 
the underlying physical processes. At the same time, data are progressively 
more abundant and of higher dimensionality as a result of more sophisti- 
cated detectors and greater data collection capacity. These challenges create 
opportunities for statisticians to make a large impact in these fields. 

In this paper we address one such challenge in the field of astrophysics. 
Informally, the setup can be described as follows. The observed data vector 
from each source is appropriately modeled as a constrained linear combina- 
tion of a set of physical components, plus some nonlinear distortion and noise 
to account for observational effects. Call this the signal model. One also has 
a computer model capable of generating a dictionary of physical components 
under different settings of the physical parameters. Using this dictionary of 
components, the signal model can be fitted to observed data. The parame- 
ters of interest — which we will refer to as target parameters — are, however, 
not the parameters explicitly appearing in the signal model, but are derived 
from them. The target parameters capture the physical essence of each ob- 
ject under study. Our goal is to find accurate estimates of these parameters 
given observed data and theoretic models of the basic components. See (3.1) 
for the formal problem statement. 

Our proposed methods choose small sets of prototypes from a large dic- 
tionary of physical components to fit the signal model to the observed data 
from each object of interest. Even though the data are truly generated as 
combinations of curves from a continuous (or fine) grid of parameters, we 
obtain more accurate maximum likelihood estimates of the target parame- 
ters by using a smaller, principled choice of prototype basis. This result is 
partially due to the fact that maximum likelihood estimation (MLE) often 
fails when the parameters take values in an infinite-dimensional space. In Ge- 
man and Hwang (1982), the authors suggest salvaging MLE for continuous 
parameter spaces by a method of sieves [Grenander (1981)], where one max- 
imizes over a constrained subspace of the parameter space and then relaxes 
the constraint as the sample size grows. Quantization is one such method 
for constraining the parameter space, and the optimal number of quanta or 
prototypes is then determined by the sample size; see Meinicke and Ritter 
(2002) for an example of quantized density estimation with MLE. Our ap- 
proach is based on similar ideas but our final goal is parameter estimation 
rather than density estimation. Although we do not directly tie the number 
of quanta to the sample size, we do observe a similar phenomenon: In the 
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face of limited, noisy data, gains can be made by reducing the parameter 
space further prior to finding the MLE. By deriving a small set of prototypes 
that effectively cover the support of the signal model, we obtain a marked 
decrease in the variance of the final parameter estimates, and only a slight 
increase in bias. Furthermore, by choosing a smaller set of prototypes, the 
fitting procedure becomes computationally tractable. 

Our principal motivation for developing this methodology is to under- 
stand the process of star formation in galaxies. Specifically, researchers in 
this field seek to improve the physical models of galaxy evolution so that 
they more accurately explain the observed patterns of galaxy star forma- 
tion history (SFH) in the Universe. The principal idea is that each galaxy 
consists of a mixture of subpopulations of stars with different ages and com- 
positions. By estimating the proportion of each constituent stellar subpopu- 
lation present, we can reconstruct the star formation rate and composition as 
a function of time, throughout the life of that galaxy. This is the approach of 
galaxy population synthesis [Bica (1988), Pelat (1997), Cid Fernandes et al. 
(2001)], whereby the observed data from each galaxy are modeled as linear 
combinations of a set of idealized simple stellar populations (SSPs, groups of 
stars having the same age and composition) plus some parametrized, nonlin- 
ear distortions. Equation (2.1) shows one such galaxy population synthesis 
model. The fitted parameters from this signal model allow us to estimate 
the SFH target parameters of each galaxy, which are simple functions of 
the parameters in this model. Astrophysicists can use the estimated SFHs 
of a large sample of galaxies to better understand the physics governing 
the evolution of galaxies and to constrain cosmological models. This model- 
ing approach has produced compelling estimates of cosmological parameters 
such as the cosmic star formation rate, the evolution of stellar mass density, 
and the stellar initial mass function, which describes the initial distribution 
of stellar masses in a population of stars [see Asari et al. (2007) and Panter 
et al. (2007) for examples of such results]. 

SFH target parameter estimates from galaxy population synthesis are 
highly dependent on the choice of SSP basis. Astronomers have the abil- 
ity to theoretically model simple stellar populations from fine parameter 
grids, but much care needs to be taken to determine an appropriate basis 
to achieve accurate SFH parameter estimates. In Richards et al. (2009a) it 
was shown that better parameter estimates are achieved by exploiting the 
underlying geometry of the SSP disribution than by using SSPs from regu- 
lar parameter grids. In this paper we will further explore this problem. Our 
main contributions are the following: 

(1) to introduce prototyping as an approach to estimating parameters 
derived from the signal model parameters and to show the effectiveness of 
quantizing the vector space or support of the model data. 
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(2) to demonstrate that sparse coding does not work as a prototyping 
method without the appropriate constraints and that constrained sparse 
coding methods do not perform well for target parameter estimation, and 

(3) to work out the details of the star formation history estimation prob- 
lem and obtain more accurate estimates of SFH for galaxies than the ap- 
proaches used in the astronomy and statistics literature. 

There are several other fields where observed data are commonly modeled 
as linear combinations of dictionaries of theoretical or idealized components 
(plus some parametrized distortions), for example: remote sensing, both of 
the Earth [Roberts et al. (1998)] and other planets [Adams, Smith and John- 
son (1986)], where the observed spectrum of each area of land is modeled 
as a mixture of pure spectral "endmembers;" computer vision and compu- 
tational anatomy [Allassonniere, Amit and Trouve (2007), Sabuncu, Balci 
and Golland (2008)], where data are modeled as mixtures of deformable tem- 
plates; and compositional modeling of asteroids [Clark et al. (2004), Hapke 
and Wells (1981)], where observed asteroids are described as mixtures of 
pure minerals to determine their composition. These applications can bene- 
fit from the methodology proposed here. A related and important problem 
in theoretical physics is gravitational wave modeling [Babak et al. (2006), 
Owen and Sathyaprakash (1999)], where large template banks are used to 
estimate the parameters of observed compact binary systems (such as neu- 
tron stars and black holes). In this particular problem, one is interpolating 
between runs of the computer model, and not modeling the observed data 
as superpositions of the model output, as we do in this paper. 

There are strong connections between this work and ongoing research into 
the design of computer experiments; see Santner, Williams and Notz (2003) 
and Levy and Steinberg (2010) for an overview of the topic. The fundamen- 
tal challenge in that setting is to adequately characterize the relationship 
between input parameters to a simulation model and the output that the 
model produces. The term "simulation model" should be interpreted broadly 
to mean computer code which produces output as a function of input param- 
eters; in situations of interest, this code is a computationally-intensive model 
for a complex physical phenomenom. Hence, one must carefully "design the 
computer experiment" by choosing the set of input parameter vectors for 
which runs of the simulator will be made. Regression methods are then used 
to approximate the output of the simulator for other values of the input 
parameters. As is the case in our application, the ultimate objective is to 
compare observed data with the simulated output to constrain these input 
parameters. Research has largely focused on situations in which the output 
of interest is scalar, but there has been recent work on functional outputs; 
see, for instance, Bayarri et al. (2007). Here, we have the same goal of pa- 
rameter estimation, but instead of seeking to reduce the number of times the 
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Fig. 1. Database of Gaussian curves used in the example in Section 1.1. Simulated data 
are generated as noisy random sparse linear combinations of these curves. As a increases, 
it becomes more difficult to distinguish the curves, especially in the presence of noise. 
A basis of prototypes for estimation of the target parameter, a, should include a higher 
proportion of low-o Gaussian curves. 

computer code must be run, we instead work with the scientific details of 
the problem at hand and simplify the code in a principled manner to reduce 
the computational burden. 

1.1. Introductory example. To elucidate the challenges of this type of 
modeling problem, we begin with a simple example. Imagine our dictionary 
consists of /i = Gaussian functions generated over a fine grid of o", such 
as those in Figure 1. We observe a set of objects, each producing data from 
a different function constructed as a sparse linear combination of the dictio- 
nary of Gaussian functions. The data from each object are sampled across 
a fixed grid with additive i.i.d. Gaussian noise. The component weights are 
constrained to be nonnegative and sum to 1, ensuring that all parameters 
are physically-plausible (e.g., cj > 0). 

Our ultimate goal is to estimate a set of target parameters for each ob- 
served data point. In this example, our target is cj, the weighted average a 
of the component Gaussian curves of each observed data vector. To this end, 
we model each observed curve as a linear superposition of a set of prototypes 
and use the estimated prototype weights to estimate a. 

If our goal were to reconstruct each data point with as small of error as 
possible, then a prototyping approach that samples along the boundary of 
the convex hull of the dictionary of Gaussian functions (such as archetypal 
analysis, see Section 4.2.1) would be optimal. In this paper, the goal is to 
achieve small errors in the target parameter estimates. A common approach 
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for this problem is to sample prototypes uniformly over the parameter space. 
However, this often leads to the inclusion of many prototypes with nearly 
identical curves. Consider the Gaussian curve example: for high values of cr, 
the curves do not change considerably with respect to changes in a. Under 
the presence of noise, curves with large a are not distinguishable. We are 
better off including a higher proportion of prototypes in the low-cr range, 
where curves change more with respect to changes in a. 

This intuition leads us to a different approach: choose prototypes by quan- 
tizing the space of curves (see Section 4.1). We show in Section 5.1 that 
a method that selects prototypes by quantizing the vector space of theo- 
retical components outperforms the method of choosing prototypes from 
a uniform grid of a in the estimation of a (see Figure 5). Additionally, judi- 
cious selection of a reduced prototype basis is an effective regularization of 
an estimation problem that is subject to large variance when the full range 
of theoretical components are utilized without any smoothing. The simu- 
lation results shown below will display markedly reduced variances in the 
estimates of the parameters of interest relative to the same procedures using 
larger libraries of basis functions. 

Additionally, smaller prototype bases yield better parameter estimates 
than the approach of using all of the theoretical components to model ob- 
served data, a phenomenon that can be explained by the markedly reduced 
variance of parameter estimates found by smaller, judiciously-chosen bases. 

1.2. Paper organization. The paper is organized as follows. In Section 2 
we detail the problem of estimating star formation history parameters for 
galaxies and explain how prototyping methods can be used to obtain accu- 
rate parameter estimates. In Section 3 we formalize the problem of prototype 
selection for target parameter estimation and in Section 4 describe several 
approaches. We apply those methods to simulated data in Section 5 to com- 
pare their performances. In Section 6 we return to the astrophysics example, 
applying our methods to galaxy data from the Sloan Digital Sky Survey. We 
end with some concluding remarks in Section 7. 

2. Modeling galaxy star formation history. Galaxies are gravitationally- 
bound objects containing 10^-10^*^ stars, gas, dust and dark matter. The 
characteristics of the light we detect from each galaxy primarily depend on 
the physical parameters (e.g., age and composition) of its component stars 
as well as distortions due to dust that resides in our line of sight to that 
galaxy, spectral distortions due to the line-of-sight component of the orbital 
velocities of its component stars, and the distance to the galaxy. 

The physical mechanisms that govern galaxy formation and evolution are 
complicated and poorly understood. Galaxies are complex, dynamic objects. 
The star formation rate (SFR) of each galaxy tends to change considerably 
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throughout its lifetime and the patterns of SFR vary greatly between dif- 
ferent galaxies. The SFR for each galaxy depends on a countless number 
of factors, such as merger history, the galaxy's local environment (e.g., the 
matter density of its neighborhood, and the properties of surrounding galax- 
ies) and chemical composition. Astronomers are interested in refining galaxy 
evolution models so that they match the observed patterns of galaxy SFH 
in the Universe. It is imperative that we first have accurate estimates of 
the star formation history parameters for each observed galaxy. These SFH 
estimates are necessary to test competing physical models, alert to possible 
shortcomings in current models, and estimate cosmological parameters [for 
an example of such an analysis, see Asari et al. (2007)]. 

2.1. Population synthesis model. A common technique in the astronomy 
literature, called empirical population synthesis, is to model each galaxy as 
a mixture of stars from different simple stellar populations (SSPs), defined as 
groups of stars with the same age and metallicity {Z, defined as the fraction 
of mass contributed by any element heavier than helium). The principle 
behind this method is that each galaxy consists of multiple subpopulations 
of stars of different age and composition so that the integrated observed 
light from each galaxy is a mixture of the light contributed by each SSP. 
Describing the data from each galaxy as a combination of SSPs allows us 
to reconstruct the star formation and metallicity history of each galaxy. 
This is because, for each galaxy, the component weight on an SSP captures 
the proportion of that galaxy's stars that was created at the specific epoch 
corresponding to the age of that SSP. Therefore, the full vector of SSP 
component weights for each galaxy describes the star formation throughout 
the galaxy's lifetime. 

Theoretical SSPs can be produced by physical models, that are in turn 
constrained by observational studies. These models typically start with a set 
of initial conditions and evolve the system forward in time based on sets of 
physically motivated differential equations. The output produced by these 
models can be extremely detailed. In our study, we use a set of high- 
resolution, broad-band spectra from the SSP models of Bruzual and Chariot 
(2003). See Figure 2 for an example of some SSP spectra, plotted over the 
optical portion of the electromagnetic spectrum. 

The galaxy data we use to estimate SFH parameters are high-resolution, 
broad-band spectra from the Sloan Digital Sky Survey [SDSS, York et al. 
(2000)] which consist of light flux measurements over thousands of wave- 
length bins. To model the data from each galaxy, we adopt the empirical 
population synthesis generative model of a galaxy spectrum introduced in 
Cid Fernandes et al. (2004): 

(2.1) Yx{-f,Mx„Av,v,,a,) = Mxo(^^-/j^j,xrx{Av)j ^G{v,,a,), 
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Fig. 2. Two bases of SSP spectra of size K — 45, colored by logt. Each spectrum is 
normalized to 1 at Ao = 4020 A. Top: basis of regular (t,Z) grid used in Cid Fernandes et 
al. (2005). Bottom: diffusion K -means basis used in Richards et al. (2009a). The diffusion 
K -means basis shows a more gradual sampling of spectral space than the regular grid basis, 
which over-samples spectra from young stellar populations. 

where Y;^ is the hght flux at wavelength A. The components of model (2.1) 
are the following: 

• Xj is the jth SSP spectrum normalized at wavelength Aq- Each SSP has 
age t(Xj) and metallicity Z(Xj). In the true generative model, X contains 
an infinite number of SSP spectra over the continuous parameters of age 
and metallicity. 

• 7j € [0, 1] , the component proportion of the jth SSP. The vector 7 is the 
population vector of the galaxy, the principal parameter of interest for 
calculating derived parameters describing the SFH of a galaxy. 

• , the observed flux at wavelength Aq . 

• r\{Av) accounts for the wavelength-dependent fraction of light that is 
either absorbed or scattered out of the line of sight by foreground dust. 
Ay parametrizes the amount of this dust extinction that occurs. We adopt 
the reddening model of Cardelli, Clayton and Mathis (1989). 

• Convolution, in wavelength, by the Gaussian kernel G{v^,,a^) describes 
spectral distortions from Doppler shifts caused by the movement of stars 
within the observed galaxy with respect to our line-of-sight, and is parame- 
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trized by a central velocity and dispersion a^,. Previous to the analysis, 
care was taken to properly resample all spectra — both the observed and 
model spectra — to 1 measurement per Angstrom.^ This was done to ensure 
the reliability of the spectral errors when used by the STARLIGHT spectral 
fitting software. More details are available at http : //www . starlight . 
uf sc . br/papers/Manual_StCv04 . pdf . 

2.2. SSP basis selection and SFH parameter estimation. For each galaxy, 
we observe a flux, Oa, at each spectral wavelength. A, with corresponding 
standard error, a\, estimated from photon counting statistics and charac- 
teristics of the telescope and detector. To estimate the target SFH param- 
eters for each galaxy, we use the STARLIGHT^ software of Cid Fernandes 
et al. (2005), fitting model (2.1) using maximum likelihood. The code uses 
a Metropolis algorithm with simulated annealing to minimize 

(2.2) x^h,Mx„Av,v„a,) = Y^ 

A=l 

where is the model flux in (2.1). The optimization routine searches for 
the maximum likelihood solution for the model 0\ ^ N(Yx,^x), i.i.d. for 
each A. The minimization of (2.2) is performed over A'^ + 4 parameters: 
7i , . . . , 7Ar , Mxf^ , Ay , , and fi* . The speed of the algorithm scales as O(iV^) , 
so it is imperative to pick a SSP basis with a small number of spectra. 

In practice, we use a basis of A' ^ prototype SSP spectra, ^ = {^i, . . . , 
^ k} — which can be a carefully chosen subset or a nontrivial combination 
of the Xj's — and model each galaxy spectrum as 

(2.3) Yx{fi,Mx,,Av,v,,cj^) = Mx,{^J3k^k^rx{Av)^ ®G(^;*,a*), 

where each prototype, has age t(^'fc) and metallicity Z(^'fc), and 

Our goal in this analysis is to choose a suitable SSP basis to estimate 
a set of physical parameters for each galaxy. Some of the commonly-used 
SFH parameters are as follows: 

• (log t)L = ^^^i 7i log t(Xj) , the luminosity- weighted average log age of the 
stars in the galaxy, 

• \og{Z)L = log^j^-^ 7jZ(Xj), the log luminosity- weighted average metal- 
licity of the stars in the galaxy, 

• 7c, a time- binned version of the population vector, 7, and 




*Note that the model SSP spectra are computed over a broader wavelength range than 
the observed spectra to provide an essential wavelength cushion for the convolution. 
^STARLIGHT can be downloaded at http://www.starlight.ufsc.br/. 
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• (logt)j\/,log(Z)jvf, mass- weighted versions of the average age and metal- 
hcity of the stars in the galaxy. 

We estimate each of these parameters using the maximum hkehhood param- 
eters from model (2.3). In Richards et al. (2009a), we introduced a method 
of choosing a SSP prototype basis and compared it to bases of regular (t, Z) 
grids that were used in previous analyses. See Figure 2 for a plot of two such 
SSP spectral bases. 

3. Formal problem statement. We begin with a large, fixed set of A'" the- 
oretical components, each with known parameters tTj (these are the physical 
properties of each component). We refer to this set as the model data. These 
data can be thought of as a sample from some distribution Px in W. The 
model data are stored in an p by matrix X = [Xi, . . . , X^v], where p is 
the total wavelength range of the SSP spectra. We assume that each ob- 
served data point Yj, j = 1,. . . , M, is generated from the linearly separable 
nonlinear model 

(3.1) Yj = f^^,,Xf,e,^+ej, 

where, for each j, the coefficients, 71^, . . . , 7jvj, are nonnegative and sum 
to 1. The functional / is a known, problem-dependent (possibly nonlinear) 
function of the linear combination of the components X and some unknown 
parameters, 6j . Each ej is a vector of random errors. The set of target param- 
eters for each observed data vector, Yj, is {pj,9i}, where pj = '}2d=ilij'^i 
is a function of the model weights, 7, and intrinsic parameters, vr, of the 
theoretical components. 

For large it is impossible to use model (3.1) to estimate each {pj,6j} 
due to the large computational cost. Our goal is to find a set of prototypes 
^ = ['J'l, . . . , ^ k], where K <^ N, that can accurately estimate the target 
parameters {pj,6j} for each observed Yj, using the model 

(3.2) Y,=/('^/3fcj*, 

\k=l 

where (3ij, . . . , (3k j are nonnegative component weights such that f3kj = 1 
for all j. Naturally, our estimate of pj is 

K N 

(3.3) Pj = ^ ^ Oik-Ki, 

k=l i=l 

where the /3jfe are estimated using the model (3.2), and a is an by K 
matrix of nonnegative coefficients that defines the prototypes from the dic- 
tionary of components by 

(3.4) * = Xq. 
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The coefficients a are constrained such that each of the prototypes, re- 
sides in a region of the theoretical component space, Rk € X, with nonzero 
probabihty, Px{Rk) > 0, over ah plausible values of the physical parameters 
used to generate X. This constraint is enforced to ensure the physical plau- 
sibility of the prototypes, ^ , and their parameters. If our prototype basis 
were to include components that are disallowed by the physical models that 
generated X, then the parameter estimates for the observed data would be 
uninterpretable. 

4. Methods for prototyping. The usual method used to choose a basis for 
estimating target parameters from the signal model is to select prototypes 
from a regular grid in the physical parameter space. Examples of such bases 
are those found in Cid Fernandes et al. (2005) and Asari et al. (2007), both of 
whom employ SSPs on regular grids of age and metallicity to estimate SFH 
parameters. In this section we propose methods that use the set of physical 
components, X, to construct a prototype basis in a principled manner. In 
Section 5 we compare the proposed basis selection methods via simulations, 
and show that regular parameter grids tend to yield suboptimal parameter 
estimates. 

4.1. Quantization of model space. For problems of interest, practical fitting 
of theoretical models to noisy data requires a finite set of prototypes. The 
question becomes how to best choose this set of prototypes, that is, how to 
quantize the model space. Here, instead of quantizing the parameter space by 
choosing uniform parameter grids, we propose methods that quantize the 
vector space X of theoretical model-produced data. The idea behind this ap- 
proach is that under the presence of noise, components with similar functio- 
nal forms will be indistinguishable, so that it is better to choose prototypes 
that are approximately evenly spaced in X (rather than evenly spaced in the 
parameter space). By replacing the theoretical models in each neighborhood 
by their local average, the model quantization approach is optimal for treat- 
ing degeneracies because it allows a slight increase in bias to achieve a large 
decrease in variance of the target parameter estimates. The increase in esti- 
mator bias should be small because more prototypes are included in param- 
eter regions where we can better discern the theoretical data curves of the 
components, allowing for precise parameter estimates in those regions and 
coarser average estimates in degenerate regions. If, instead, multiple com- 
ponents in our dictionary were to have very similar theoretical data curves 
but different parameter values, then, in the absence of any other method 
of regularization, we would have difficulty breaking the degeneracy no mat- 
ter how many prototypes we include in that region of the parameter space, 
causing increased parameter estimator variance and higher statistical risk. 

4.1.1. K-means and diffusion K-means. The basic idea here is to quan- 
tize the vector space or support of model-produced data with respect to an 
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appropriate metric and prior distribution. The vector quantization approach 
can be formahzed as follows: 

Suppose that Xi , . . . , X^v is a sample from some distribution Px with 
support X cW. The support X often has some lower-dimensional structure, 
which we refer to as the lower-dimensional geometry of X. Fix an integer 
K < N. To any dictionary A = {ai, . . . jSlk} of prototypes, we can assign 
a cost 

(4.1) W{A,Px)= [ mm\\x-afPx{dx). 

Let Bk denote all sets of the form B = {bi, . . . ,b/^} with hj € M^. Define 
the optimal dictionary of K prototypes as the cluster centers 

^ = argminVF(i?, Px)- 

BeBk 

In practice, we estimate * from model-produced data Xi, . . . , X^v according 
to 

* = arg min W{B,Px), 

B 

where Px is the empirical distribution. This estimate is found by Lloyd's 
i^-means (KM) algorithm. To simplify the notation, we will henceforth skip 
the hat symbol on all estimates. 

The empirical i^-means solution corresponds to allocating each Xj into 
subsets Si, . . . ,Sk, where the K centroids define the prototypes. In the 
definition of the prototypes in (3.4), this reduces to 



(4.2) a^k 



1 



0, else. 



if i G Sk, 



Potential problems to this approach are the following: (1) the KM prototypes 
will adhere to the design density on X, and (2) for small K, estimated 
prototypes could fall in areas that Px assigns probability zero. The first 
issue can be corrected using a weighted i^T-means approach or a method such 
as uniform subset selection (Section 4.1.2). However, often the density on X 
corresponds to a prior distribution on the physical parameters, meaning it 
is often desirable to adhere to its design density. To remedy the latter issue, 
we could select as prototypes the K data points that are closest to each 
of the centroids. We see in simulations that this approach tends to yield 
slightly worse parameter estimates than the original K-means formulation. 
We attribute this to the smoother sampling of parameter space achieved by 
the original KM formulation, which averages the parameters of components 
with similar theoretical data, effectively decreasing the variability of the 
parameter estimates. 
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If the theoretical data are high dimensional, we might choose to first learn 
the low-dimensional structure of X and then employ K-means in this re- 
duced space. This would permit us to avoid quantizing high-dimensional 
data, where ET-means can be problematic due to the curse of dimensional- 
ity. This failure occurs because the theoretical data are extremely sparse in 
high dimensions, causing the distances between similar components to ap- 
proach the distances between unrelated objects. To remedy this, we suggest 
the use of the diffusion map method for nonlinear dimensionality reduc- 
tion [Coifman and Lafon (2006), Lafon and Lee (2006)]. In other words, we 
transform the model data into a lower-dimensional representation where we 
apply X-means (diffusion ET-means, DKM). Formally, this corresponds to 
substituting (4.1) with the cost function 



where (/> is a data transformation defined by diffusion maps. 

4.1.2. Uniform subset selection. In the theoretical model data quantiza- 
tion approach the goal is to have prototypes regularly spaced in X, where X 
is the support of Px- With this heuristic in mind, we devise the uniform 
subset selection (USS) method, which sequentially chooses the component 
Xj € X that is furthest away from the closest component that has already 
been chosen. Because the choice of distance metric is flexible, USS can be 
tailored to deal with many data types and high-dimensional data. Unlike 
iiT-means, USS is not influenced by differences in the density of components 
across X. However, USS typically chooses extreme components as proto- 
types because in each successive selection it picks the furthest theoretical 
data curve from the active set. In simulations, USS produces poor parameter 
estimates due to its tendency to select extreme components. 

4.2. Sparse coding approaches. Most standard sparse coding techniques 
do not apply for the prototyping problem. Without the appropriate con- 
straints, the prototype basis elements will be nonphysical and the subsequent 
parameter estimates will be nonsensical (see Section 4.2.3). There are meth- 
ods related to sparse coding that enforce the proper constraints to ensure 
that prototype basis elements reside within the native data space (see Sec- 
tions 4.2.1 and 4.2.2), but these generally do not perform well for target pa- 
rameter estimation because their objective of optimal data reconstruction — 
and not estimation of the target parameters — forces these methods to choose 
extreme prototypes. 



Software for diffusion maps and diffusion X-means is available in the dif fusionMap R 
package, which can be downloaded from http://cran.r-project.org/web/packages/ 
dif f usionMap/index . html . 



(4.3) 
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4.2.1. Archetypal analysis. Archetypal analysis (AA) was introduced by 
Cutler and Breiman (1994) as a method of representing each data point as 
a linear mixture of archetypal examples, which themselves are linear mix- 
tures of the original component dictionary. The method searches for the set 
of archetypes ^i, . . . , that satisfy (3.4) and minimize the residual sum 
of squares (RSS) 



(4.4) 



(4.5) 



RSS: 
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where X^^Li Pik = 1 for all i and > for all i and k. To minimize the RSS 
criterion, an alternating nonnegative least squares algorithm is employed, al- 
ternating between finding the best /3's for a set of prototypes and finding 
the best prototypes (ct's) for a set of /3's. This computation scales linearly 
in the number of dimensions of the original theoretical data, with compu- 
tational complexity becoming prohibitive for dimensionality more than 500 
[Stone (2002)]. 

Once there are as many prototypes, K, as the number of data points that 
define the boundary of the convex hull, any element in the dictionary can be 
fit perfectly with a linear mixture of the prototypes, yielding a RSS of 0. If 
we try to pick more prototypes than the number of data points that define 
the boundary of the convex hull, then the AA algorithm will fail to converge 
because /3 becomes noninvertible, preventing the iterative algorithm to find 
the optimal set of prototypes, ^ = /3~^X, given the current /3. We have 
experimented with using the Moore-Penrose pseudoinverse to perform this 
operation, but it is usually ill-behaved when /3 is noninvertible. This upper 
bound on the number of AA prototypes is a serious drawback to using AA 
as a prototyping method because often the complicated nature of the data 
generating processes necessitates the use of larger prototype bases. 

Prototypes found by AA are optimal in the sense that they minimize 
the RSS for fitting noiseless, linear mixtures of the X's. This is the case 
because AA prototypes are found along the boundary of the convex hull 
formed by the X's [see Cutler and Breiman (1994)]. Unlike AA, our ob- 
jective is not to minimize RSS, but to minimize the error in the derived 
parameter estimates. Archetypal analysis achieves suboptimal results in the 
estimation of p because it only samples prototypes from the boundary of the 
component space, X, focusing attention on extreme cases while disregarding 
large regions of X. In Section 5 we show using simulated data that AA is 
outperformed by the model quantization approach for estimating the target 
parameters from the signal model parameters. 
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4.2.2. Sparse subset selection. We introduce the method of sparse sub- 
set selection (SSS), whose goal is to find a subset of the original dictionary, 

C X, that can reconstruct X in a linear mixture setting. This method is 
motivated by sparse coding in that it seeks the basis that minimizes a reg- 
ularized reconstruction of X, where the regularization is chosen to select 
a subset of the columns of X. 

Recently, Obozinski et al. (2011) introduced a method of variable selection 
in a high-dimensional multivariate linear regression setting. Their method 
uses a penalty on the ii/iq norm, for q > 1, of the matrix of regression 
coefficients in such a way that induces sparsity in the rows of the coefficient 
matrix. We can, in a straightforward way, adapt their method to select 
a subset of columns of X to be used as prototypes. Our objective function is 



where || • ||_f is the Frobenius norm of a matrix, and the £i/iq penalty is 
defined as 



so that sparsity is induced in the rows of B, the N hy N matrix of nonneg- 
ative mixture coefficients. Additionally, B is normalized to sum to 1 across 
columns. The basis, is defined as the columns of X that correspond to 
nonzero rows of B (a is the corresponding indicator variable). The param- 
eter Afc controls the number of prototypes in our SSS set 

To perform the optimization (4.6), we use the CVX Mat lab package [Grant 
and Boyd (2010)]. Setting q = 2, we recast the problem as a second-order 
cone problem with the additional constraints of nonnegativity and column 
normalization of B [see Boyd and Vandenberghe (2004)]. The current im- 
plementation cannot solve problems for large N. In Section 4.3 we show, for 
a small problem, that SSS has behavior similar to archetypal analysis in that 
it selects prototypes from the boundary of the convex hull of X. Like AA, 
SSS is not a good method for target parameter estimation. 

4.2.3. Some methods not useful for prototyping. There are other meth- 
ods for sparse data representation that fail to work for prototype selection. 
These methods are not applicable to this problem because they do not se- 
lect prototypes that reside in regions of X with nonzero probability Px. 
The failure to obey this constraint means that the chosen prototypes in 
general will not be physical, meaning that either their theoretical data or 
intrinsic parameters are disallowed. For instance, in the SFH problem, this 
could lead us to use prototypes whose spectra have negative photon fluxes or 



(4.6) 




(4.7) 
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whose ages are either negative or greater than the age of the Universe. Using 
such uninterpretable prototypes to model observed data produces parameter 
estimates that are nonsensical. 

We mention two popular methods for estimating small bases from large 
dictionaries, X, and describe why they are not useful for prototyping: 

In standard sparse coding [Olshausen et al. (1996)], the goal is to find 
a decomposition of the matrix X, in which the hidden components are sparse. 
Sparse coding combines the goal of small reconstruction error along with 
sparseness, via minimization of 



where the trade-off between ii sparsity in the mixture coefficients A, and 
accurate reconstruction of X, is controlled by A. However, there are no 
constraints on the sign of the entries of A or meaning that prototypes 
with nonphysical attributes are allowed. 

Nonnegative Matrix Factorization (NMF) [Lee and Seung (2001), Paatero 
and Tapper (1994)] is a related technique that includes strict nonnegativity 
constraints on all coefficients aij and while minimizing the reconstruc- 
tion of X, 



This construction is different than our prototype definition in (3.4), where 
^ = Xa. To reconcile the two, we see that, since N > K, a is the right 
inverse of A: 



which exists if A is full rank. However, under this formulation, the aij are 
not constrained to be nonnegative and the resultant prototypes are not con- 
strained to reside in X. Thus, NMF is not useful for prototyping. Note that 
archetypal analysis avoids this problem by enforcing the further constraint 
that the prototypes be constrained linear combinations of X. 

4.3. Comparison of prototypes. We apply four prototyping methods to 
the two-dimensional data set toy in the archetypes R package.^ We treat 
each 2-D data point, Xj, as model-produced theoretical data. Plots of this 
dictionary of data and the selected prototypes for four different prototyping 
methods, using K = 7, are in Figure 3. if-means places prototypes evenly 
spaced within the convex hull of the data. USS also evenly allocates the pro- 
totypes, but places many along the boundary of the native space. Archetypal 





(4.9) 




(4.10) 




^Available from CRAN at http://craii.r-project.org/web/packages/archetypes. 
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Fig. 3. Distribution of prototypes (red k's) for four different methods when applied to 
the 250 theoretical data objects in the toy data set (grey •'s). K-means evenly samples 
the native data space while the other methods focus more attention to the boundary of the 
space. 



analysis and SSS place all prototypes on the boundary of the convex hull. 
Note that for more than 7 prototypes, the archetypal analysis algorithm 
does not converge to a solution. 

5. Simulated examples. In this section we test the effectiveness of the 
prototyping methods for estimating a set of target parameters using sim- 
ulated data. The first test set is the toy example of zero-mean Gaussian 
curves discussed in Section 1.1. The second simulation experiment is a set 
of realistic galaxy spectra created to mimic the SDSS data that we later 
analyze in Section 6. 

5.1. Gaussian curves. We begin with the example introduced in Sec- 
tion 1.1. We simulate a database of = 157, /U = Gaussian curves, Xi, . . . , 
Xjv, on a fine grid of a = (cxi, . . . , ajsf) from 0.2 to 8 in steps of 0.05 (see Fig- 
ure 1). Each Xj is represented as a vector of length 321. From this database, 
we simulate a set of 100 data vectors, Yi, . . . , Yioo, from the model 

N 

(5.1) Y, = ^7,,X, + e„ 

i=l 
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Parameters for Gaussian curve prototypes 
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Fig. 4. Distribution of K = 15 prototype a values for seven different prototyping meth- 
ods applied to the Gaussian curves example. The methods are the following: Grid-regular a 
grid, log Grid-regular log(a") grid, KM — K -means, DKM — diffusion K -means, USS — uni- 
form subset selection, A A — archetypal analysis, and SSS — sparse subset selection. 

where the mixture coefficients, > 0, sum to unity for each j and have at 
most 5 nonzero entries for each j. The noise vectors, £j, are i.i.d. normal 
zero-mean with standard deviation 0.05. 

From Xi,...,X7v, we generate bases of prototypes using six different 
methods described in Section 4. To explore the differences in each of these 
methods, we plot (Figure 4) the distribution oi K = 15 prototype a values. 
The model quantization methods (KM, DKM, USS) find more prototypes 
with small a values. The AA and SSS methods place more prototypes at the 
extreme values of a (note that for SSS, we ran the algorithm on a coarser 
grid of 32 Gaussian curves) . 

To evaluate each of the methods, we compare their ability to estimate the 
average a for each Yj, defined as 



For each choice of basis, we fit the observed data using nonnegative least 
squares.^ In Figure 5 the MSE for a estimation for ii'-means, diffusion K- 
means, USS and uniform cj-grid and log{a) grid bases is plotted as a function 
of K. SSS is not plotted because it yields parameter estimates with MSE 
> 2. AA is not plotted because it only converges for K < 15, and performs 
worse than the a grid for those values. KM and DKM outperform the regular 
parameter grids, USS, and AA prototype bases. KM achieves a minimum 
MSE, averaged over 25 trials, of 0.815 at = 10 prototypes. DKM achieves 
a minimum MSE of 0.846 at K = 15 prototypes, while the uniform a grid 
achieves a minimum MSE of 1.378, 1.7 times higher than the best MSE for 



We use the nnls R package, which uses the Lawson-Hanson nonnegative least squares 
implementation [Lawson and Hanson (1995)]. 



N 





i=l 
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Fig. 5. MSE for the estimation of a for the Gaussian curve example. Plotted is the 
MSE for using a regular parameter grid, K -means (KM), diffusion K -means (DKM) and 
archetypal analysis (AA) prototype bases. Both DKM and KM achieve significantly better a 
estimates than a regular parameter grid and outperform estimates obtained by using all 
157 Gaussian curves in the original dictionary. For each K , the MSE is averaged across 
25 repetitions of the experiment. Point-wise 68% confidence bands are shown as dotted 
lines. 

KM. Results for AA and SSS are not plotted because AA only converges 
for K <lb prototypes, and SSS is too computationally intensive to run 
on the entire dictionary of curves; at K = 15, neither method outperforms 
a uniform a grid. 

An interesting observation in Figure 5 is that the minimum MSE for 
estimating a is achieved for = 10 KM prototypes. As the number of pro- 
totypes increases from 10, the KM a estimates worsen. This exemplifies the 
bias- variance trade-off in the estimation procedure: for K > 10, the increased 
variance of the estimates is larger than the reduction in squared-bias. Esti- 
mates of a from four of the five prototype bases plotted in Figure 5 outper- 
form the estimates found by fitting each Yj as a mixture of all 157 original 
component curves. Over the 25 repetitions of the simulations, the 7jj which 
are positive, that is, the Xj that receive any weight, vary widely. These 
results demonstrate that a single, judiciously chosen, reduced basis can re- 
produce a wide range of truths and return accurate parameter estimates 
with reduced variance. 

5.2. Simulated galaxy spectra. We further test the performance of each 
prototyping method using realistic simulated galaxy spectra. Starting with 
a database, X, of 1,182 SSPs from the models of Bruzual and Chariot (2003) 
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(see Section 2), we generate simulated galaxy spectra using the model (2.1). 
The SSPs are generated from 6 different metallicities and a fine sampling 
of 197 ages from to 14 Gyrs. We use a prescription similar to Chen et al. 
(2009) to choose the physical parameters of the simulations, altered to have 
higher contribution from younger SSPs. The basic physical components of 
the simulation are as follows: 

(1) A star formation history with exponentially decaying star formation 
rate (SFR): SFR oc exp{'yt). Here, 7 > 0, so the SFR is exponentially declin- 
ing with time, as t is the age of the SSP today. 

(2) We allow 7 to vary between galaxies. For each galaxy we draw 7 from 
a uniform distribution between 0.25 and 1 Gyr^^. 

(3) The time fform when a galaxy begins star formation is distributed 
uniformly between and 5.7 Gyr after the Big Bang, where the Universe is 
assumed to be 13.7 Gyr old. 

(4) We allow for starbursts, epochs of increased SFR, with equal proba- 
bility at all times. The probability a starburst begins at time t is constructed 
so that the probability of no starbursts in the life of the galaxy is 33%. The 
length of each burst is distributed uniformly between 0.03 and 0.3 Gyr and 
the fraction of total stellar mass formed in the burst in the past 0.5 Gyr is 
distributed log-uniformly between and 0.5. The SFR of each starburst is 
constant throughout the length of the burst. 

Each galaxy spectrum is generated as a mixture of SSPs of up to 197 time 
bins, with a uniformly drawn metallicity in each bin. We draw the reddening 
parameter (Ay) and velocity dispersion (ctq) from empirical distributions 
over a plausible range of each parameter. We simulate 100 galaxy spectra 
with i.i.d. zero-mean Gaussian noise with S/N = 10 at Aq = 4020 A. 

We apply the methods in Section 4 to choose SSP prototype bases from X. 
In Figure 6 the distributions of the SSP prototype ages and metallicities 
for K = 150 prototype bases are plotted along with the regular parameter 
grid used by Asari et al. (2007). Each method highly samples the older, 
higher metallicity SSPs and typically only includes a few prototypes with 
low age and low metallicity. This is reasonable because older, higher metal- 
lic SSP spectra change more with respect to changes in age and metallicity. 
Any method for prototyping based on the model-produced data will de- 
tect this difference and sample these regions of the parameter space more 
highly. 

Each simulated galaxy spectrum is fit using the STARLIGHT software with 
each prototype basis. To assess the performance of each method, we com- 
pare the accuracy of their parameter estimates. In Figure 7 we plot the 
MSE of the estimates of log(f*)L, {logZ^)L,Av and cr* and the average er- 
ror of the coarse-grained population vector estimate, 7c, measured by the 
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Fig. 6. Distribution of{t,Z) of several prototype bases of SSPs, K — 150. All bases were 
derived using a database of 1,182 model-produced SSPs. Each of the methods more heavily 
samples prototypes with large age and large metalUcity. 



average (.2 distance to the true 7c. Each prototype method outperforms the 
regular parameter grid prototype bases, often by large margins, especially 
for K = 4:5. Between the different prototyping methods there does not ap- 
pear to be a clear winner, though diffusion X-means bases achieve the lowest 
or second-lowest MSE for 4 of the 5 parameters. means also achieves ac- 
curate estimates for each of the parameters, and always beats or ties the 
X-means-central estimates. Both USS and AA yield inaccurate estimates 
for all parameters except (logZ*)^, and cr*. SSS could not be run on such 
a large dictionary of SSPs. Overall, small bases achieve better estimates of 
log{t^)L,Av and 7c, but this likely will not be the case for real galaxies, 
whose SFHs are more complicated and diverse than the simulation prescrip- 
tion used. 

6. Analysis of SDSS galaxies. Prototyping methods are used to estimate 
the SFH parameters from the SDSS spectra of a set of 3046 galaxies in 
SDSS Data Release 6 [Adelman-McCarthy et al. (2008)]. For more detailed 
information about the data and preprocessing steps, see Richards et al. 
(2009a). In Figure 8 we plot the estimated log(t*)L versus (logZ*)^, for each 
galaxy using three basis choices: the regular parameter grid of Asari et al. 
(2007) (Asa07, K = 150), DKM with K = 45, and DKM with K = 150. 
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Fig. 7. Errors m physical parameter estimates for galaxy simulations using prototype 
techniques: K-means (KM), diffusion K-means (DKM), centroid K-means (KM-central) , 
USS, AA, and a regular parameter grid. MSEs are plotted for bases of size K = 10, 25, 45, 
100 and 150. The regular parameter grids are from Cid Fernandes et al. (2005) (K = 45) 
and Asari et al. (2007) (K = 150^. Each prototyping method finds more accurate SFH 
parameter estimates than the two regular parameter grids. 



There are several differences in the estimated (log Z^)i^ — log(t*) l relation 
for each basis. First, both diffusion i^T-means bases produce estimates that 
are tightly spread around an increasing trend while the Asa07 estimates are 
more diffusely spread around such a trend. The direction of discrepancy in 
the Asa07 estimates from the trend corresponds exactly with the direction 
of a well-known spectral degeneracy between old, metal-poor and young, 
metal-rich galaxies [Worthey (1994)]. This suggests that the observed vari- 
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Fig. 8. Estimates o/log(f,)L versus (log.Z,)L for a set of 3046 galaxies observed by 
the SDSS, estimated using STARLIGHT with three different prototype bases. From left to 
right, bases are as follows: regular parameter grid from Asari et al. (2007) with K — 150, 
diffusion K-means K = 45, and diffusion K-means K = 150. Estimates from diffusion 
K-means bases show much less spread in the direction of the well-known age-metallicity 
degeneracy in galaxy population synthesis studies. 

ability along this direction is not due to the physics of these galaxies, but 
rather is caused by confusion stemming from the choice of basis [in Richards 
et al. (2009a) we verified that diffusion X-means SFH estimates have a de- 
creased age-metallicity degeneracy, using simulated galaxy spectra]. Second, 
the K = 45 diffusion K-means basis estimates no young, metal-poor galaxies, 
whereas the other bases do. This suggests that this small number of pro- 
totypes is not sufficient to cover the parameter space; particularly, young, 
metal-poor SSPs have been neglected in the K = 45 diffusion A'-means ba- 
sis. Finally, the overall trend between log(t*)L versus (logZ*)^ differs sub- 
stantially between the regular grid and diffusion K-means basis, suggesting 
that SFH parameter estimates are sensitive to the choice of basis and that 
downstream cosmological inferences will depend heavily on the basis used. 

Recently, we have estimated the SFH parameters for all 781,692 galaxies 
in the SDSS DR7 [Abazajian (2009)] main sample or LRG sample. This 
subset of DR7 galaxies was chosen for analysis because it was targeted for 
spectroscopic observation, and thus has a well defined selection function 
[Strauss (2002)]. We estimated the parameters using STARLIGHT with a dif- 
fusion A-means basis of size K = 150. The computational routines took 
nearly 5 CPU years to analyze the entire data set, which includes prepro- 
cessing of the data, estimating the SFH parameters for each, and compiling 
the catalog of estimates. The computations were performed in parallel on 
the 1,000-core high-performance FLUX cluster at the University of Michi- 
gan. Results of this analysis are in preparation [Richards and Miller (2011)] 
and will be published shortly. These SFH estimates will be used to constrain 
cosmological models that concern the formation and evolution of galaxies 
and the history and fate of the Universe. 
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There is also ongoing work into approaches to quantifying the statisti- 
cal uncertainty in the resulting parameter estimates. This is a critical, but 
challenging, component. The basic approach to be employed will exploit the 
massive amount of data by inspecting the amount of variability in param- 
eter estimates in small neighborhoods in the space of galaxy spectra. An 
additional regression model will be fit, with the parameter estimates as the 
response, and the spectrum as the predictor. In previous work [Richards 
et al. (2009b) and Freeman et al. (2009)], we have fit models of exactly this 
type, using galaxy spectra or colors to predict redshift. As was the case in 
that work, we will smooth the parameter estimates in the high-dimensional 
space to obtain an estimator with lower variance. Equally important, this 
will yield a natural way of estimating the uncertainty in the estimator, by 
inspecting the variance of the residuals of the regression fit. 

7. Conclusions. We have introduced a prototyping approach for the com- 
mon class of parameter estimation problems where observed data are pro- 
duced as a constrained linear combination of theoretical model-produced 
components, and the target parameters are derived from the parameters 
in the signal model. The usual approach to this type of problem is to use 
models on a regular grid in parameter space. In this paper we have intro- 
duced approaches that use the properties of the theoretical data from the 
dictionary of components to estimate prototype bases. These approaches 
include: quantizing the component model data space using i('-means, select- 
ing prototypes uniformly over the space of theoretical component data, and 
estimating prototype bases that minimize the reconstruction error of the 
components. 

Our main findings are the following: 

• The quantization methods presented in this paper achieve better param- 
eter estimates than the approach of using prototypes from a regular pa- 
rameter grid, as shown in multiple simulations. The regularization that 
results from a reduced basis leads to reduced variance in the parameter es- 
timates, without sacrificing accuracy. This is the case because components 
with similar theoretical data will be indiscernible under the presence of 
noise, making it crucial that prototypes be spread out evenly in theoretical 
data space, inducing a large decrease in variance of the target parameter 
estimates. If bases are too small, then the parameter estimates suffer from 
large bias because important regions of model space are neglected. 

• Standard sparse coding methods are not appropriate for this class of prob- 
lem. Without the proper constraints, these methods do not find prototypes 
that are physically-plausible. Even with these constraints, these methods 
select prototypes around the boundary of the data distribution, which is 
good for data reconstruction but not for target parameter estimation. 
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• For a complicated problem in astrophysics — estimating the history of star 
formation for each galaxy in a large database — we obtain more accurate 
parameters (in simulations) using the model quantization approach than 
using regular parameter grids. When applied to the real data, these differ- 
ent prototyping approaches produce markedly different results, showing 
the importance of prototype basis selection. 

REFERENCES 

Abazajian, K. N. e. a. (2009). The seventh data release of the sloan digital sky survey. 

Astrophysical Journal Supplement Series 182 543-558. 
Adams, J. B., Smith, M. O. and Johnson, P. E. (1986). Spectral mixture modeling: 

A new analysis of rock and soil types at the Viking Lander 1 site. J. Geophys. Res. 91 

8098-8112. 

Adelman-McCarthy, J. K. et al. (2008). The sixth data release of the sloan digital sky 

survey. Astrophysical Journal Supplement Series 175 297-313. 
Allassonniere, S., Amit, Y. and Trouve, A. (2007). Towards a coherent statistical 

framework for dense deformable template estimation. J. R. Stat. Soc. Ser. B Stat. 

Methodol. 69 3-29. MR2301497 
AsARi, N. v.. Cm Fernandes, R., Stasinska, G., Torres-Papaqui, J. P., Ma- 

teus, a., Sodre, L., Schoenell, W. and Gomes, J. M. (2007). The history of 

star-forming galaxies in the sloan digital sky survey. Monthly Notices of the Royal 

Astronomical Society 381 263-279. 
Babak, S., Balasubramanian, R., Churches, D., Cokelaer, T. and Sathyapra- 

KASH, B. (2006). A template bank to search for gravitational waves from inspiralling 

compact binaries: L Physical models. Classical Quantum Gravity 23 5477-5504. 
Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., 

Parthasarathy, R. J., Paulo, R., Sacks, J. and Walsh, D. (2007). Computer model 

validation with functional output. Ann. Statist. 35 1874-1906. MR2363956 
Bica, E. (1988). Population synthesis in galactic nuclei using a library of star clusters. 

Astronomy and Astrophysics 195 76-92. 
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, 

Cambridge. MR2061575 
Bruzual, G. and Charlot, S. (2003). Stellar population synthesis at the resolution of 

2003. Monthly Notices of the Royal Astronomical Society 344 1000-1028. 
Cardelli, J. A., Clayton, G. C. and Mathis, J. S. (1989). The relationship between 

infrared, optical, and ultraviolet extinction. Astrophysical Journal 345 245-256. 
Chen, X. Y., Liang, Y. C, Hammer, P., Zhao, Y. H. and Zhong, G. H. (2009). Stellar 

population analysis on local infrared-selected galaxies. Astronomy and Astrophysics 495 

457-469. 

CiD Fernandes, R., Sodre, L., Schmitt, H. R. and Leao, J. R. S. (2001). A prob- 
abilistic formulation for empirical population synthesis: Sampling methods and tests. 
Monthly Notices of the Royal Astronomical Society 325 60-76. 

CiD Fernandes, R., Gu, Q., Melnick, J., Terlevich, E., Terlevich, R., Kunth, D., 
RODRIGUES Lacerda, R. and Joguet, B. (2004). The star formation history of Seyfert 
2 nuclei. Monthly Notices of the Royal Astronomical Society 355 273-296. 

CiD Fernandes, R., Mateus, A., Sodre, L., Stasinska, G. and Gomes, J. M. (2005). 
Semi-empirical analysis of Sloan Digital Sky Survey galaxies. L Spectral synthesis 
method. Monthly Notices of the Royal Astronomical Society 358 363-378. 



26 



RICHARDS, LEE, SCHAFER AND FREEMAN 



Clark, B. E., Bus, S. J., Rivkin, A. S., McConnochie, T., Sanders, J., Shah, S., 
HiROl, T. and Shepard, M. (2004). E-type asteroid spectroscopy and compositional 
modeling. J. Geophys. Res. 109 2001. 

CoiFMAN, R. R. and Lafon, S. (2006). Diffusion maps. Appl. Comput. Harmon. Anal. 
21 5-30. MR2238665 

Cutler, A. and Breiman, L. (1994). Archetypal analysis. Technometrics 36 338-347. 
MR1304898 

Freeman, P. E., Newman, J. A., Lee, A. B., Richards, .J. W. and Schafer, C. M. 

(2009). Photometric redshift estimation using spectral connectivity analysis. Monthly 

Notices of the Royal Astronomical Society 398 2012-2021. 
Geman, S. and Hwang, C.-R. (1982). Nonparametric maximum likelihood estimation by 

the method of sieves. Ann. Statist. 10 401-414. MR0653512 
Grant, M. and Boyd, S. (2010). CVX: Matlab Software for Disciplined Convex Pro- 
gramming, version 1.21. Available at http://cvxr.com/cvx. 
Grenander, U. (1981). Abstract Inference. Wiley, New York. MR0599175 
Hapke, B. and Wells, E. (1981). Bidirectional reflectance spectroscopy 2. Experiments 

and observations. J. Geophys. Res. 86 3055-3060. 
Lafon, S. and Lee, A. B. (2006). Diffusion maps and coarse-graining: A unified frame- 
work for dimensionality reduction, graph partitioning, and data set parameterization. 

IEEE Transactions on Pattern Analysis and Machine Intelligence 28 1393. 
Lawson, C. L. and Hanson, R. J. (1995). Solving Least Squares Problems. SIAM, 

Philadelphia, PA. MR1349828 
Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. 

Adv. Neural Inf. Process. Syst. 556-562. 
Levy, S. and Steinberg, D. M. (2010). Computer experiments: A review. AStA Adv. 

Stat. Anal. 94 311-324. MR2753331 
Meinicke, p. and Ritter, H. (2002). Quantizing density estimators. Adv. Neural Inf. 

Process. Syst. 2 825-832. 
Obozinski, G., Wainwright, M. J., Jordan, M. I. (2011). Support union recovery in 

high-dimensional multivariate regression. Ann. Statist. 39 1-47. MR2797839 
Olshausen, B. a. et al. (1996). Emergence of simple-cell receptive field properties by 

learning a sparse code for natural images. Nature 381 607-609. 
Owen, B. ,J. and Sathyaprakash, B. (1999). Matched filtering of gravitational waves 

from inspiraling compact binaries: Computational cost and template placement. Phys. 

Rev. D 60 22002. 

Paatero, p. and Tapper, U. (1994). Positive matrix factorization: A non-negative factor 
model with optimal utilization of error estimates of data values. Environmetrics 5 111- 
126. 

Panter, B., Jimenez, R., Heavens, A. F. and Charlot, S. (2007). The star forma- 
tion histories of galaxies in the sloan digital sky survey. Monthly Notices of the Royal 
Astronomical Society 378 1550-1564. 

Pelat, D. (1997). A new method to solve stellar population synthesis problems with the 
use of a data base. Monthly Notices of the Royal Astronomical Society 284 365-375. 

Richards, J. W. and Miller, C. J. (2011). Star formation history estimates for SDSS 
DR6. Unpublished manuscript, Carnegie Mellon Univ., Pittsburgh, PA. 

Richards, J. W., Freeman, P. E., Lee, A. B. and Schafer, C. M. (2009a). Accurate 
parameter estimation for star formation history in galaxies using SDSS spectra. Monthly 
Notices of the Royal Astronomical Society 399 1044-1057. 

Richards, J. W., Freeman, P. E., Lee, A. B. and Schafer, C. M. (2009b). Exploiting 
low-dimensional structure in astronomical spectra. Astrophysical Journal 691 32-42. 



PROTOTYPE SELECTION 



27 



Roberts, D., Gardner, M., Church, R., Ustin, S., Scheer, G. and Green, R. (1998). 
Mapping chaparral in the Santa Monica Mountains using multiple endmember spectral 
mixture models. Remote Sensing of Environment 65 267-279. 

Sabuncu, M., Balci, S. and Golland, P. (2008). Discovering modes of an image popu- 
lation through mixture modeling. In Proceedings MICCAI 2008: Medical Image Com- 
puting and Computer-Assisted Intervention-MICCAI 381-389. Springer, Berlin. 

Santner, T. J., Williams, B. J. and Notz, W. 1. (2003). The Design and Analysis of 
Computer Experiments. Springer, New York. MR2 160708 

Stone, E. (2002). Exploring archetypal dynamics of pattern formation in cellular flames. 
Phys. D 161 163-186. MR1878533 

Strauss, M. A. e. a. (2002). Spectroscopic target selection in the sloan digital sky survey: 
The main galaxy sample. Astronomical Journal 124 1810-1824. 

WORTHEY, G. (1994). Comprehensive stellar population models and the disentanglement 
of age and metallicity effects. Astrophysical Journal Supplement Series 95 107-149. 

York, D. G. et al. (2000). The sloan digital sky survey; Technical summary. Astronomical 
Journal 120 1579-1587. 



J. W. Richards 

Department of Astronomy 

University of California, Berkeley 

601 Campbell Hall 

Berkeley, California 94720 

USA 

E-MAIL: jwrichar@stat.berkeley.edu 



A. B. Lee 

C. M. Schafer 

P. E. Freeman 

Department of Statistics 

Carnegie Mellon University 

5000 Forbes Avenue 

Pittsburgh, Pennsylvania 15213 

USA 



